Teaching Inverse Reinforcement Learners via Features and Demonstrations
Learning near-optimal behaviour from an expert's demonstrations typically
relies on the assumption that the learner knows the features that the true
reward function depends on. In this paper, we study the problem of learning
from demonstrations in the setting where this is not the case, i.e., where
there is a mismatch between the worldviews of the learner and the expert. We
introduce a natural quantity, the teaching risk, which measures the potential
suboptimality of policies that look optimal to the learner in this setting. We
show that bounds on the teaching risk guarantee that the learner is able to
find a near-optimal policy using standard algorithms based on inverse
reinforcement learning. Based on these findings, we suggest a teaching scheme
in which the expert can decrease the teaching risk by updating the learner's
worldview, and thus ultimately enable her to find a near-optimal policy.
Comment: NeurIPS'2018 (extended version)
Noisy Submodular Maximization via Adaptive Sampling with Applications to Crowdsourced Image Collection Summarization
We address the problem of maximizing an unknown submodular function that can
only be accessed via noisy evaluations. Our work is motivated by the task of
summarizing content, e.g., image collections, by leveraging users' feedback in
the form of clicks or ratings. For summarization tasks with the goal of maximizing
coverage and diversity, submodular set functions are a natural choice. When the
underlying submodular function is unknown, users' feedback can provide noisy
evaluations of the function that we seek to maximize. We provide a generic
algorithm -- ExpGreedy -- for maximizing an unknown submodular function under
cardinality constraints. This algorithm makes use of a novel exploration module
-- TopX -- that proposes good elements based on adaptively sampling noisy
function evaluations. TopX is able to accommodate different kinds of
observation models such as value queries and pairwise comparisons. We provide
PAC-style guarantees on the quality and sampling cost of the solution obtained
by ExpGreedy. We demonstrate the effectiveness of our approach in an
interactive, crowdsourced image collection summarization application.
Comment: Extended version of AAAI'16 paper
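As a rough illustration of the setting in the abstract above (maximizing a submodular function under a cardinality constraint when only noisy evaluations are available), the following is a minimal sketch. It is not the paper's algorithm: it uses plain greedy selection with a fixed number of repeated noisy queries per candidate instead of adaptive sampling, and the coverage objective, the Gaussian noise model, and all names (coverage, noisy_greedy, example_sets) are assumptions made for the example.

```python
import random


def coverage(selected, sets):
    """Submodular coverage objective: number of distinct items covered."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)


def noisy_greedy(sets, k, noise_std=0.5, samples=30, seed=0):
    """Greedy selection of up to k sets using averaged noisy evaluations.

    Assumption: each query returns the true value plus Gaussian noise, and a
    fixed number of samples per candidate is averaged (no adaptive sampling).
    """
    rng = random.Random(seed)

    def noisy_value(selected):
        return coverage(selected, sets) + rng.gauss(0.0, noise_std)

    chosen = []
    for _ in range(min(k, len(sets))):
        best, best_est = None, float("-inf")
        for i in range(len(sets)):
            if i in chosen:
                continue
            # Average repeated noisy queries to estimate the value of adding i.
            est = sum(noisy_value(chosen + [i]) for _ in range(samples)) / samples
            if est > best_est:
                best, best_est = i, est
        chosen.append(best)
    return chosen


example_sets = [{1, 2, 3, 4}, {3, 4, 5}, {6, 7}, {1, 2}]
summary = noisy_greedy(example_sets, k=2, noise_std=0.0)
```

With the noise switched off, the sketch reduces to ordinary greedy coverage maximization; a fixed sample budget per candidate is the simplest possible choice and spends queries on clearly bad candidates, which is the inefficiency an adaptive scheme is meant to avoid.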
Stochastic Privacy
Online services such as web search and e-commerce applications typically rely
on the collection of data about users, including details of their activities on
the web. Such personal data is used to enhance the quality of service via
personalization of content and to maximize revenues via better targeting of
advertisements and deeper engagement of users on sites. To date, service
providers have largely either required or requested users' consent to opt in
to sharing their data. Users may be willing to share
private information in return for better quality of service or for incentives,
or in return for assurances about the nature and extent of the logging of data.
We introduce stochastic privacy, a new approach to privacy centering on
a simple concept: A guarantee is provided to users about the upper-bound on the
probability that their personal data will be used. Such a probability, which we
refer to as privacy risk, can be assessed by users as a preference or
communicated as a policy by a service provider. Service providers can work to
personalize and to optimize revenues in accordance with preferences about
privacy risk. We present procedures, proofs, and an overall system for
maximizing the quality of services, while respecting bounds on allowable or
communicated privacy risk. We demonstrate the methodology with a case study and
evaluation of the procedures applied to web search personalization. We show how
we can achieve near-optimal utility of accessing information with provable
guarantees on the probability of sharing data.
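The core guarantee described above, an upper bound on the probability that a user's data is used, can be illustrated with a minimal sketch. This is not the paper's procedure: it only shows the simplest way to respect such a bound, namely admitting each user to a candidate pool independently with probability equal to that user's privacy risk, so that any later selection restricted to the pool uses the data with at most that probability. The function names and the user ids are hypothetical.

```python
import random


def sample_users(privacy_risk, seed=0):
    """Return the ids of users whose data may be accessed.

    privacy_risk maps a (hypothetical) user id to that user's bound r in [0, 1].
    Each user enters the pool independently with probability r.
    """
    rng = random.Random(seed)
    return [u for u, r in sorted(privacy_risk.items()) if rng.random() < r]


def empirical_rate(user, privacy_risk, trials=200):
    """Fraction of independent runs (one seed each) in which `user` is sampled."""
    hits = sum(user in sample_users(privacy_risk, seed=t) for t in range(trials))
    return hits / trials


risks = {"alice": 0.0, "bob": 1.0, "carol": 0.1}
pool = sample_users(risks)
```

A service could then optimize personalization using only the users in pool; because selection never reaches outside the pool, the per-user bound holds regardless of how the optimization is done.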
Information Gathering with Peers: Submodular Optimization with Peer-Prediction Constraints
We study a problem of optimal information gathering from multiple data
providers that need to be incentivized to provide accurate information. This
problem arises in many real world applications that rely on crowdsourced data
sets, but where the process of obtaining data is costly. A notable example of
such a scenario is crowd sensing. To this end, we formulate the problem of
optimal information gathering as maximization of a submodular function under a
budget constraint, where the budget represents the total expected payment to
data providers. Contrary to the existing approaches, we base our payments on
incentives for accuracy and truthfulness, in particular, peer-prediction
methods that score each of the selected data providers against its best peer,
while ensuring that the minimum expected payment is above a given threshold. We
first show that the problem at hand is hard to approximate within a constant
factor that is not dependent on the properties of the payment function.
However, for given topological and analytical properties of the instance, we
construct two greedy algorithms, respectively called PPCGreedy and
PPCGreedyIter, and establish theoretical bounds on their performance w.r.t. the
optimal solution. Finally, we evaluate our methods using a realistic crowd
sensing testbed.
Comment: Longer version of AAAI'18 paper
Interactive Teaching Algorithms for Inverse Reinforcement Learning
We study the problem of inverse reinforcement learning (IRL) with the added
twist that the learner is assisted by a helpful teacher. More formally, we
tackle the following algorithmic question: How could a teacher provide an
informative sequence of demonstrations to an IRL learner to speed up the
learning process? We present an interactive teaching framework where a teacher
adaptively chooses the next demonstration based on the learner's current policy. In
particular, we design teaching algorithms for two concrete settings: an
omniscient setting where a teacher has full knowledge about the learner's
dynamics and a blackbox setting where the teacher has minimal knowledge. Then,
we study a sequential variant of the popular MCE-IRL learner and prove
convergence guarantees of our teaching algorithm in the omniscient setting.
Extensive experiments with a car driving simulator environment show that
learning can be sped up drastically compared to an uninformative
teacher.
Comment: IJCAI'19 paper (extended version)
Learning User Preferences to Incentivize Exploration in the Sharing Economy
We study platforms in the sharing economy and discuss the need for
incentivizing users to explore options that otherwise would not be chosen. For
instance, rental platforms such as Airbnb typically rely on customer reviews to
provide users with relevant information about different options. Yet, often a
large fraction of options does not have any reviews available. Such options are
frequently neglected as viable choices, and in turn are unlikely to be
evaluated, creating a vicious cycle. Platforms can engage users to deviate from
their preferred choice by offering monetary incentives for choosing a different
option instead. To efficiently learn the optimal incentives to offer, we
consider structural information in user preferences and introduce a novel
algorithm - Coordinated Online Learning (CoOL) - for learning with structural
information modeled as convex constraints. We provide formal guarantees on the
performance of our algorithm and test the viability of our approach in a user
study with data of apartments on Airbnb. Our findings suggest that our approach
is well-suited to learn appropriate incentives and increase exploration on the
investigated platform.
Comment: Longer version of AAAI'18 paper. arXiv admin note: text overlap with
arXiv:1702.0284